Aims and Motivation for the Course
We aim to:


• Develop a theory which can characterize the behavior of real-world
Random Signals and Processes;
• Use standard Probability Theory for this.


Random signal theory is important for 


• Analysis of signals;

• Inference of underlying system parameters from noisy observed data;

• Design of optimal systems (digital and analogue signal recovery,
signal classification, estimation ...);

• Predicting system performance (error-rates, signal-to-noise ratios, ...).


Example: 

Speech signals 

Use probability theory to characterize that some sequences of vowels and 
consonants are more likely than others, some waveforms more likely than 
others for a given vowel or consonant. Please see [link]. 

Use this to achieve: speech recognition, speech coding, speech 
enhancement, ...
Four utterances of the vowel sound 'Aah'.


Example: 

Digital communications 

Characterize the properties of the digital data source (mobile phone, digital 
television transmitter, ...), characterize the noise/distortions present in the 
transmission channel. Please see [link]. 

Use this to achieve: accurate regeneration of the digital signal at the 
receiver, analysis of the channel characteristics ...
Digital data stream from a noisy communications channel.


Probability theory is used to give a mathematical description of the
behavior of real-world systems which involve elements of randomness.
Such a system might be as simple as a coin-flipping experiment, in which
we are interested in whether 'Heads' or 'Tails' is the outcome, or it might be
more complex, as in the study of random errors in a coded digital data
stream (e.g. a CD recording or a digital mobile phone).


The basics of probability theory should be familiar from the IB Probability
and Statistics course. Here we summarize the main results from that course
and develop them into a framework that can encompass random signals and
processes.


Probability Distributions 


The distribution $P_X$ of a random variable $X$ is simply a probability measure
which assigns probabilities to events on the real line. The distribution $P_X$
answers questions of the form:


What is the probability that $X$ lies in some subset $F$ of the real line?


In practice we summarize $P_X$ by its Probability Mass Function - pmf (for
discrete variables only), Probability Density Function - pdf (mainly for
continuous variables), or Cumulative Distribution Function - cdf (for
either discrete or continuous variables).


Probability Mass Function (pmf) 


Suppose the discrete random variable $X$ can take a set of $M$ real values
$\{x_1, \ldots, x_M\}$, then the pmf is defined as:
Equation:

\[
p_X(x_i) = \Pr[X = x_i] = P_X(\{x_i\})
\]

where $\sum_{i=1}^{M} p_X(x_i) = 1$. For example, for a normal 6-sided die, $M = 6$ and
$p_X(x_i) = \frac{1}{6}$. For a pair of dice being thrown, $M = 11$ and the pmf is as
shown in (a) of [link].

(b) cdf of 2-dice throws
(c) pdf of 2-dice throws
Examples of pmfs, cdfs and pdfs: (a) to (c) for a
discrete process, the sum of two dice; (d) and (e)
for a continuous process with a normal or
Gaussian distribution, whose mean = 2 and
variance = 3.


Cumulative Distribution Function (cdf) 


The cdf can describe discrete, continuous or mixed distributions of $X$ and is
defined as:
Equation:

\[
F_X(x) = \Pr[X \le x] = P_X((-\infty, x])
\]

For discrete $X$:
Equation:

\[
F_X(x) = \sum_i \{\, p_X(x_i) : x_i \le x \,\}
\]

giving step-like cdfs as in the example of (b) of [link].
Properties follow directly from the Axioms of Probability:


1. $0 \le F_X(x) \le 1$

2. $F_X(-\infty) = 0$, $F_X(\infty) = 1$

3. $F_X(x)$ is non-decreasing as $x$ increases

4. $\Pr[x_1 < X \le x_2] = F_X(x_2) - F_X(x_1)$

5. $\Pr[X > x] = 1 - F_X(x)$


Where there is no ambiguity we will often drop the subscript $X$ and refer to
the cdf as $F(x)$.
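
As a small illustration of the pmf and cdf definitions above, the following sketch builds the pmf of the sum of two fair dice by enumeration and evaluates the step-like cdf. The function names (`two_dice_pmf`, `cdf`) are illustrative choices, not part of the course material, and exact fractions are used so that the checks are not affected by rounding.

```python
from fractions import Fraction
from itertools import product


def two_dice_pmf():
    """Return {total: probability} for the sum of two fair six-sided dice."""
    pmf = {}
    for a, b in product(range(1, 7), repeat=2):
        # Each of the 36 ordered outcomes is equally likely.
        pmf[a + b] = pmf.get(a + b, Fraction(0)) + Fraction(1, 36)
    return pmf


def cdf(pmf, x):
    """F_X(x): sum of p_X(x_i) over all x_i <= x."""
    return sum(p for xi, p in pmf.items() if xi <= x)


pmf = two_dice_pmf()
print("M =", len(pmf))                                   # number of distinct totals
print("sum of pmf =", sum(pmf.values()))                 # must equal 1
print("Pr[X <= 7] =", cdf(pmf, 7))
print("Pr[4 < X <= 9] =", cdf(pmf, 9) - cdf(pmf, 4))     # cdf property 4
```

The last line uses property 4 of the cdf, so the same probability could equally be obtained by summing the pmf over the totals 5 to 9.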


Probability Density Function (pdf) 


The pdf of $X$ is defined as the derivative of the cdf:
Equation:

\[
f_X(x) = \frac{d}{dx} F_X(x)
\]

The pdf can also be interpreted in derivative form as $\delta x \to 0$:
Equation:

\[
f_X(x)\,\delta x = \Pr[x < X \le x + \delta x] = F_X(x + \delta x) - F_X(x)
\]

For a discrete random variable with pmf given by $p_X(x_i)$:
Equation:

\[
f_X(x) = \sum_{i=1}^{M} p_X(x_i)\,\delta(x - x_i)
\]

An example of the pdf of the 2-dice discrete random process is shown in (c)
of [link]. (Strictly the delta functions should extend vertically to infinity, but
we show them only reaching the values of their areas, $p_X(x_i)$.)


The pdf and cdf of a continuous distribution (in this case the normal or 
Gaussian distribution) are shown in (d) and (e) of [link]. 


Note: The cdf is the integral of the pdf and should always go from zero to
unity for a valid probability distribution.


Properties of pdfs:


1. $f_X(x) \ge 0$

2. $\int_{-\infty}^{\infty} f_X(x)\,dx = 1$

3. $F_X(x) = \int_{-\infty}^{x} f_X(\alpha)\,d\alpha$

4. $\Pr[x_1 < X \le x_2] = \int_{x_1}^{x_2} f_X(\alpha)\,d\alpha$


As for the cdf, we will often drop the subscript $X$ and refer simply to $f(x)$
when no confusion can arise.
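
The pdf properties can be checked numerically for the continuous example mentioned in the figure caption (a Gaussian with mean 2 and variance 3). This is only a sketch: the trapezoidal integrator, the integration limits and the function names are assumptions made for illustration.

```python
import math


def gaussian_pdf(x, mean=2.0, var=3.0):
    """Gaussian pdf; the default mean and variance match the figure example."""
    return math.exp(-(x - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def gaussian_cdf(x, mean=2.0, var=3.0):
    """Closed-form Gaussian cdf expressed through the error function."""
    return 0.5 * (1.0 + math.erf((x - mean) / math.sqrt(2.0 * var)))


def integrate(f, a, b, steps=20000):
    """Composite trapezoidal rule on [a, b]."""
    h = (b - a) / steps
    total = 0.5 * (f(a) + f(b))
    for k in range(1, steps):
        total += f(a + k * h)
    return total * h


# Property 2: the pdf integrates to one (limits chosen far out in the tails).
area = integrate(gaussian_pdf, -20.0, 24.0)

# Property 4: Pr[1 < X <= 4] is the integral of the pdf from 1 to 4,
# which should agree with the difference of cdf values.
prob_integral = integrate(gaussian_pdf, 1.0, 4.0)
prob_cdf = gaussian_cdf(4.0) - gaussian_cdf(1.0)

print(area, prob_integral, prob_cdf)
```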


Conditional Probabilities and Bayes' Rule 
If A and B are two separate but possibly dependent random events, then:


1. Probability of A and B occurring together = Pr[A, B]
2. The conditional probability of A, given that B occurs = Pr[A | B]
3. The conditional probability of B, given that A occurs = Pr[B | A]


From elementary rules of probability (Venn diagrams):
Equation:

\[
\Pr[A, B] = \Pr[A \mid B]\,\Pr[B] = \Pr[B \mid A]\,\Pr[A]
\]

Dividing the right-hand pair of expressions by Pr[B] gives Bayes' rule:
Equation:

\[
\Pr[A \mid B] = \frac{\Pr[B \mid A]\,\Pr[A]}{\Pr[B]}
\]


In problems of probabilistic inference, we are often trying to estimate the
most probable underlying model for a random process, based on some
observed data or evidence. If A represents a given set of model parameters,
and B represents the set of observed data values, then the terms in [link] are
given the following terminology:


• Pr[A] is the prior probability of the model A (in the absence of any
evidence);

• Pr[B] is the probability of the evidence B;

• Pr[B | A] is the likelihood that the evidence B was produced, given
that the model was A;

• Pr[A | B] is the posterior probability of the model being A, given
that the evidence is B.


Quite often, we try to find the model A which maximizes the posterior
Pr[A | B]. This is known as maximum a posteriori or MAP model
selection.


The following example illustrates the concepts of Bayesian model selection. 


Example: 

Loaded Dice 

Problem:

Given a tub containing 100 six-sided dice, in which one die is known to be
loaded towards the six to a specified extent, derive an expression for the
probability that, after a given set of throws, an arbitrarily chosen die is the
loaded one. Assume the other 99 dice are all fair (not loaded in any way).
The loaded die is known to have the following pmf:

\[
p_L(1) = 0.05, \qquad \{p_L(2), \ldots, p_L(5)\} = 0.15, \qquad p_L(6) = 0.35
\]

Hence derive a good strategy for finding the loaded die from the tub.
Solution:
The pmfs of the fair dice may be assumed to be:

\[
p_F(i) = \frac{1}{6} \quad \text{for } i = 1, \ldots, 6
\]


Let each die have one of two states, $S = L$ if it is loaded and $S = F$ if it is
fair. These are our two possible models for the random process and they
have underlying pmfs given by $\{p_L(1), \ldots, p_L(6)\}$ and
$\{p_F(1), \ldots, p_F(6)\}$ respectively.

After $N$ throws of the chosen die, let the sequence of throws be
$\Theta_N = \{\theta_1, \ldots, \theta_N\}$, where each $\theta_i \in \{1, \ldots, 6\}$. This is our evidence.
We shall now calculate the probability that this die is the loaded one. We
therefore wish to find the posterior $\Pr[S = L \mid \Theta_N]$.

We cannot evaluate this directly, but we can evaluate the likelihoods,
$\Pr[\Theta_N \mid S = L]$ and $\Pr[\Theta_N \mid S = F]$, since we know the expected pmfs
in each case. We also know the prior probabilities $\Pr[S = L]$ and
$\Pr[S = F]$ before we have carried out any throws, and these are
$\{0.01, 0.99\}$ since only one die in the tub of 100 is loaded. Hence we can
use Bayes' rule:

Equation:

\[
\Pr[S = L \mid \Theta_N] = \frac{\Pr[\Theta_N \mid S = L]\,\Pr[S = L]}{\Pr[\Theta_N]}
\]

The denominator term $\Pr[\Theta_N]$ is there to ensure that $\Pr[S = L \mid \Theta_N]$ and
$\Pr[S = F \mid \Theta_N]$ sum to unity (as they must). It can most easily be
calculated from:

Equation:

\[
\Pr[\Theta_N] = \Pr[\Theta_N, S = L] + \Pr[\Theta_N, S = F]
= \Pr[\Theta_N \mid S = L]\,\Pr[S = L] + \Pr[\Theta_N \mid S = F]\,\Pr[S = F]
\]

so that

Equation:

\[
\Pr[S = L \mid \Theta_N]
= \frac{\Pr[\Theta_N \mid S = L]\,\Pr[S = L]}{\Pr[\Theta_N \mid S = L]\,\Pr[S = L] + \Pr[\Theta_N \mid S = F]\,\Pr[S = F]}
= \frac{1}{1 + R_N}
\]

where
Equation:

\[
R_N = \frac{\Pr[\Theta_N \mid S = F]\,\Pr[S = F]}{\Pr[\Theta_N \mid S = L]\,\Pr[S = L]}
\]

To calculate the likelihoods, $\Pr[\Theta_N \mid S = L]$ and $\Pr[\Theta_N \mid S = F]$, we
simply take the product of the probabilities of each throw occurring in the
sequence of throws $\Theta_N$, given each of the two models respectively (since
each new throw is independent of all previous throws, given the model).

So, after $N$ throws, these likelihoods will be given by:

Equation:

\[
\Pr[\Theta_N \mid S = L] = \prod_{n=1}^{N} p_L(\theta_n)
\]

and
Equation:

\[
\Pr[\Theta_N \mid S = F] = \prod_{n=1}^{N} p_F(\theta_n)
\]

We can now substitute these probabilities into the above expression for $R_N$
and include $\Pr[S = L] = 0.01$ and $\Pr[S = F] = 0.99$ to get the desired a
posteriori probability $\Pr[S = L \mid \Theta_N]$ after $N$ throws using [link].

We may calculate this iteratively by noting that

Equation:

\[
\Pr[\Theta_N \mid S = L] = \Pr[\Theta_{N-1} \mid S = L]\; p_L(\theta_N)
\]

and
Equation:

\[
\Pr[\Theta_N \mid S = F] = \Pr[\Theta_{N-1} \mid S = F]\; p_F(\theta_N)
\]

so that
Equation:

\[
R_N = R_{N-1}\,\frac{p_F(\theta_N)}{p_L(\theta_N)}
\]

where $R_0 = \frac{\Pr[S = F]}{\Pr[S = L]} = 99$. If we calculate this after every throw of the
current die being tested (i.e. as $N$ increases), then we can either move on to
test the next die from the tub if $\Pr[S = L \mid \Theta_N]$ becomes sufficiently
small (say $< 10^{-4}$) or accept the current die as the loaded one when
$\Pr[S = L \mid \Theta_N]$ becomes large enough (say $> 0.995$). (These
thresholds correspond approximately to $R_N > 10^{4}$ and $R_N < 5 \times 10^{-3}$
respectively.)


The choice of these thresholds for $\Pr[S = L \mid \Theta_N]$ is a function of the
desired tradeoff between speed of searching versus the probability of
failure to find the loaded die, either by moving on to the next die even
when the current one is loaded, or by selecting a fair die as the loaded one.
The lower threshold, $p_1 = 10^{-4}$, is the more critical, because it affects
how long we spend before discarding each fair die. The probability of
correctly detecting all the fair dice before the loaded die is reached is
$(1 - p_1)^n \approx 1 - n p_1$, where $n \approx 50$ is the expected number of fair dice
tested before the loaded one is found. So the failure probability due to
incorrectly assuming the loaded die to be fair is approximately
$n p_1 \approx 0.005$.

The upper threshold, $p_2 = 0.995$, is much less critical on search speed,
since the loaded result only occurs once, so it is a good idea to set it very
close to unity. The failure probability caused by selecting a fair die to be
the loaded one is just $1 - p_2 = 0.005$. Hence the

overall failure probability = 0.005 + 0.005 = 0.01


Note: In problems with significant amounts of evidence (e.g. large $N$), the
evidence probability and the likelihoods can both get very small,
sufficient to cause floating-point underflow on many computers if
equations such as [link] and [link] are computed directly. However the
ratio of likelihood to evidence probability still remains a reasonable size
and is an important quantity which must be calculated correctly.


One solution to this problem is to compute only the ratio of likelihoods, as
in [link]. A more generally useful solution is to compute log(likelihoods)
instead. The product operations in the expressions for the likelihoods then
become sums of logarithms. Even the calculation of likelihood ratios such
as $R_N$ and comparison with appropriate thresholds can be done in the log
domain. After this, it is OK to return to the linear domain if necessary since
$R_N$ should be a reasonable value as it is the ratio of very small quantities.

Probabilities of the current die being the loaded one as the
throws progress (20th die is the loaded one). A new die is
selected whenever the probability falls below $p_1$.


Surface plot of histograms of throws
Image plot of histograms of throws
Histograms of the dice throws as the throws progress. Histograms are
reset when each new die is selected.


Joint and Conditional cdfs and pdfs 


Cumulative distribution functions 


We define the joint cdf to be
Equation:

\[
F(x, y) = \Pr[(X \le x) \wedge (Y \le y)]
\]

and conditional cdf to be
Equation:

\[
F(x \mid y) = \Pr[X \le x \mid Y \le y]
\]

Hence we get the following rules:


• Conditional probability (cdf):
Equation:

\[
F(x \mid y) = \frac{F(x, y)}{F(y)}
\]

• Bayes Rule (cdf):
Equation:

\[
F(x \mid y) = \frac{F(y \mid x)\,F(x)}{F(y)}
\]

• Total probability (cdf):
Equation:

\[
F(x, \infty) = F(x)
\]

which follows because the event $Y < \infty$ itself forms a partition of the
sample space.


Conditional cdfs have similar properties to standard cdfs, i.e.

\[
F_{X|Y}(-\infty \mid y) = 0, \qquad F_{X|Y}(\infty \mid y) = 1
\]


Probability density functions 


We define joint and conditional pdfs in terms of corresponding cdfs. The
joint pdf is defined to be
Equation:

\[
f(x, y) = \frac{\partial^2 F(x, y)}{\partial x\,\partial y}
\]

and the conditional pdf is defined to be

Equation:

\[
f(x \mid y) = \frac{\partial F'(x \mid Y = y)}{\partial x}
\]

where

\[
F'(x \mid Y = y) = \Pr[X \le x \mid Y = y]
\]

Note that $F'(x \mid Y = y)$ is different from the conditional cdf $F(x \mid y) = \Pr[X \le x \mid Y \le y]$,
previously defined, but there is a slight problem. The event, $Y = y$, has
zero probability for continuous random variables, hence probability
conditional on $Y = y$ is not directly defined and $F'(x \mid Y = y)$ cannot be
found by direct application of event-based probability. However all is OK if
we consider it as a limiting case:

Equation:

\[
F'(x \mid Y = y) = \lim_{\delta y \to 0} \Pr[X \le x \mid y < Y \le y + \delta y]
= \lim_{\delta y \to 0} \frac{F(x, y + \delta y) - F(x, y)}{F(y + \delta y) - F(y)}
= \frac{\partial F(x, y) / \partial y}{f(y)}
\]

Joint and conditional pdfs have similar properties and interpretation to
ordinary pdfs:

\[
f(x, y) \ge 0, \qquad \iint f(x, y)\,dx\,dy = 1
\]

\[
f(x \mid y) \ge 0, \qquad \int f(x \mid y)\,dx = 1
\]

Note: From now on interpret $\int$ as $\int_{-\infty}^{\infty}$ unless otherwise stated.


For pdfs we get the following rules:


• Conditional pdf:
Equation:

\[
f(x \mid y) = \frac{f(x, y)}{f(y)}
\]

• Bayes Rule (pdf):
Equation:

\[
f(x \mid y) = \frac{f(y \mid x)\,f(x)}{f(y)}
\]

• Total Probability (pdf):
Equation:

\[
\int f(y \mid x)\,f(x)\,dx = \int f(y, x)\,dx = f(y) \int f(x \mid y)\,dx = f(y)
\]

The final result is often referred to as the Marginalisation Integral
and $f(y)$ as the Marginal Probability.


Random Vectors 


Random Vectors are simply groups of random variables, arranged as vectors. E.g.:
Equation:

\[
\mathbf{X} = (X_1 \;\ldots\; X_n)^T
\]

where $X_1, \ldots, X_n$ are $n$ separate random variables.


In general, all of the previous results can be applied to random vectors as well as to
random scalars, but vectors allow some interesting new results too.


(a) pdf of 2-D normal distribution: mean = 0, var = 1
(b) pdf of Rayleigh distribution: var = 2
pdfs of (a) a 2-D normal distribution and (b) a Rayleigh distribution,
corresponding to the magnitude of the 2-D random vectors.


Example - Arrows on a target 


Suppose that arrows are shot at a target and land at random distances from the
target centre. The horizontal and vertical components of these distances are formed
into a 2-D random error vector. If each component of this error vector is an
independent variable with zero-mean Gaussian pdf of variance $\sigma^2$, calculate the
pdfs of the radial magnitude and the phase angle of the error vector.


Let the error vector be
Equation:

\[
\mathbf{X} = (X_1 \; X_2)^T
\]

$X_1$ and $X_2$ each have a zero-mean Gaussian pdf given by
Equation:

\[
f(x) = \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-\frac{x^2}{2\sigma^2}}
\]

Since $X_1$ and $X_2$ are independent, the 2-D pdf of $\mathbf{X}$ is
Equation:

\[
f_{\mathbf{X}}(x_1, x_2) = f(x_1)\,f(x_2) = \frac{1}{2\pi\sigma^2}\, e^{-\frac{x_1^2 + x_2^2}{2\sigma^2}}
\]


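In polar coordinates

\[
x_1 = r\cos\theta \quad \text{and} \quad x_2 = r\sin\theta
\]

To calculate the radial pdf, we substitute $r = \sqrt{x_1^2 + x_2^2}$ in the above 2-D pdf to
get:
Equation:

\[
\Pr[r < R \le r + \delta r] = \int_{r}^{r + \delta r} \int_{-\pi}^{\pi} f_{\mathbf{X}}(x_1, x_2)\, R\, d\theta\, dR
\]

where, for small $\delta r$,

\[
\int_{r}^{r + \delta r} \int_{-\pi}^{\pi} f_{\mathbf{X}}(x_1, x_2)\, R\, d\theta\, dR
\approx \delta r \int_{-\pi}^{\pi} \frac{1}{2\pi\sigma^2}\, e^{-\frac{r^2}{2\sigma^2}}\, r\, d\theta
= \frac{1}{\sigma^2}\, r\, e^{-\frac{r^2}{2\sigma^2}}\, \delta r
\]

Hence the radial pdf of the error vector is:
Equation:

\[
f_R(r) = \lim_{\delta r \to 0} \frac{\Pr[r < R \le r + \delta r]}{\delta r}
= \frac{1}{\sigma^2}\, r\, e^{-\frac{r^2}{2\sigma^2}}
\]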


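This is a Rayleigh distribution with mean-square value $E[R^2] = 2\sigma^2$ (there are two components of
$\mathbf{X}$, each with variance $\sigma^2$).


The 2-D pdf of $\mathbf{X}$ depends only on $r$ and not on $\theta$, so the angular pdf of the error
vector is constant over any $2\pi$ interval and is therefore

\[
f_\Theta(\theta) = \frac{1}{2\pi}
\]

so that

\[
\int_{-\pi}^{\pi} f_\Theta(\theta)\, d\theta = 1
\]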
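
As a numerical sanity check on the radial pdf derived above, the sketch below integrates the Rayleigh pdf to confirm that it has unit area and that its second moment $E[R^2]$ equals $2\sigma^2$, the value quoted in the text. The value of `sigma`, the integration limit and the trapezoidal integrator are arbitrary choices made for illustration.

```python
import math


def rayleigh_pdf(r, sigma):
    """Radial pdf f_R(r) = (r / sigma^2) exp(-r^2 / (2 sigma^2)) for r >= 0."""
    if r < 0.0:
        return 0.0
    return (r / sigma ** 2) * math.exp(-r ** 2 / (2.0 * sigma ** 2))


def integrate(f, a, b, steps=20000):
    """Composite trapezoidal rule on [a, b]."""
    h = (b - a) / steps
    total = 0.5 * (f(a) + f(b))
    for k in range(1, steps):
        total += f(a + k * h)
    return total * h


sigma = 1.5                      # assumed standard deviation of each component
upper = 12.0 * sigma             # far enough out that the tail is negligible

area = integrate(lambda r: rayleigh_pdf(r, sigma), 0.0, upper)
mean_square = integrate(lambda r: r * r * rayleigh_pdf(r, sigma), 0.0, upper)

print(area)                      # should be close to 1
print(mean_square, 2.0 * sigma ** 2)
```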


Random Signals 


Random signals are random variables which evolve, often with time (e.g. audio noise), but 
also with distance (e.g. intensity in an image of a random texture), or sometimes another 
parameter. 


They can be described as usual by their cdf and either their pmf (if the amplitude is discrete, 
as in a digitized signal) or their pdf (if the amplitude is continuous, as in most analogue 
signals). 


However a very important additional property is how rapidly a random signal fluctuates. 
Clearly a slowly varying signal such as the waves in an ocean is very different from a 
rapidly varying signal such as vibrations in a vehicle. We will see later in [link] how to deal 
with these frequency dependent characteristics of randomness. 


For the moment we shall assume that random signals are sampled at regular intervals and 
that each signal is equivalent to a sequence of samples of a given random process, as in the 
following examples. 


(a) Transmitted binary signal
(b) Filtered signal
(c) Filtered noise
(d) Signal + noise at the detector
(a) The transmitted binary signal; (b) the binary signal after filtering with
a half-sine receiver filter; (c) the channel noise after
filtering with the same filter; (d) the filtered signal
plus noise at the detector in the receiver.


pdfs of signal + noise at the detector (one curve for data = +1, one for data = -1)
The pdfs of the signal plus noise at the detector for the two data values, $\pm 1$. The
vertical dashed line is the detector threshold and the shaded area to the left
of the origin represents the probability of error when data = 1.


Example - Detection of a binary signal in noise 


We now consider the example of detecting a binary signal after it has passed through a 
channel which adds noise. The transmitted signal is typically as shown in (a) of [link]. 


In order to reduce the channel noise, the receiver will include a lowpass filter. The aim of
the filter is to reduce the noise as much as possible without reducing the peak values of the
signal significantly. A good filter for this has a half-sine impulse response of the form:
Equation:

\[
h(t) =
\begin{cases}
\dfrac{\pi}{2 T_b} \sin\!\left(\dfrac{\pi t}{T_b}\right) & \text{if } 0 \le t \le T_b, \\[1ex]
0 & \text{otherwise}
\end{cases}
\]

where $T_b$ = bit period.


This filter will convert the rectangular data bits into sinusoidally shaped pulses as shown in 
(b) of [link] and it will also convert wide bandwidth channel noise into the form shown in 
(c) of [link]. Bandlimited noise of this form will usually have an approximately Gaussian 
pdf. 


Because this filter has an impulse response limited to just one bit period and has unit gain at
zero frequency (the area under $h(t)$ is unity), the signal values at the center of each bit
period at the detector will still be $\pm 1$. If we choose to sample each bit at the detector at
this optimal mid point, the pdfs of the signal plus noise at the detector will be as shown in
[link].


Let the filtered data signal be D(t) and the filtered noise be U(t), then the detector signal is 
Equation: 


R(t) = D(t) + U(t) 


If we assume that $+1$ and $-1$ bits are equiprobable and the noise is a symmetric zero-mean
process, the optimum detector threshold is clearly midway between these two states,
i.e. at zero. The probability of error when the data = $+1$ is then given by:

Equation:

\[
\Pr[\text{error} \mid D = +1] = \Pr[R(t) < 0 \mid D = +1] = F_U(-1) = \int_{-\infty}^{-1} f_U(u)\,du
\]

where $F_U$ and $f_U$ are the cdf and pdf of $U$. This is the shaded area in [link].


Similarly the probability of error when the data = $-1$ is then given by:
Equation:

\[
\Pr[\text{error} \mid D = -1] = \Pr[R(t) > 0 \mid D = -1] = 1 - F_U(+1) = \int_{1}^{\infty} f_U(u)\,du
\]

Hence the overall probability of error is:
Equation:

\[
\begin{aligned}
\Pr[\text{error}] &= \Pr[\text{error} \mid D = +1]\,\Pr[D = +1] + \Pr[\text{error} \mid D = -1]\,\Pr[D = -1] \\
&= \int_{-\infty}^{-1} f_U(u)\,du\;\Pr[D = +1] + \int_{1}^{\infty} f_U(u)\,du\;\Pr[D = -1]
\end{aligned}
\]

and, since $f_U$ is symmetric about zero,

\[
\Pr[\text{error}] = \int_{1}^{\infty} f_U(u)\,du\;\bigl(\Pr[D = +1] + \Pr[D = -1]\bigr) = \int_{1}^{\infty} f_U(u)\,du
\]


To be a little more general and to account for signal attenuation over the channel, we shall
assume that the signal values at the detector are $\pm v_0$ (rather than $\pm 1$) and that the
filtered noise at the detector has a zero-mean Gaussian pdf with variance $\sigma^2$:

Equation:

\[
f_U(u) = \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-\frac{u^2}{2\sigma^2}}
\]

and so
Equation:

\[
\Pr[\text{error}] = \int_{v_0}^{\infty} f_U(u)\,du
= \int_{v_0/\sigma}^{\infty} f_U(\sigma u)\,\sigma\,du
= Q\!\left(\frac{v_0}{\sigma}\right)
\]

where
Equation:

\[
Q(x) = \frac{1}{\sqrt{2\pi}} \int_{x}^{\infty} e^{-\frac{u^2}{2}}\,du
\]

This integral has no analytic solution, but a good approximation to it exists and is discussed
in some detail in [link].


From [link] we may obtain the probability of error in the binary detector, which is often
expressed as the bit error rate or BER. For example, if $\Pr[\text{error}] = 2 \times 10^{-3}$, this would
often be expressed as a bit error rate of $2 \times 10^{-3}$, or alternatively as 1 error in 500 bits (on
average).


The argument $\left(\frac{v_0}{\sigma}\right)$ in [link] is the signal-to-noise voltage ratio (SNR) at the detector, and
the BER rapidly diminishes with increasing SNR (see [link]).
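
The dependence of the BER on the SNR can be tabulated with a few lines of code. This sketch evaluates $Q(x)$ through the complementary error function in Python's standard `math` module, using the standard identity $Q(x) = \frac{1}{2}\,\mathrm{erfc}(x/\sqrt{2})$; the function names and the particular SNR values are illustrative assumptions.

```python
import math


def q_function(x):
    """Gaussian error integral Q(x) = 0.5 * erfc(x / sqrt(2))."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def bit_error_rate(v0, sigma):
    """Pr[error] = Q(v0 / sigma) for signal levels +/- v0 and noise std sigma."""
    return q_function(v0 / sigma)


# BER for a few voltage SNRs (v0 / sigma), with sigma fixed at 1.
for snr in (1.0, 2.0, 3.0, 4.0, 5.0):
    print(f"v0/sigma = {snr:.1f}  BER = {bit_error_rate(snr, 1.0):.3e}")
```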


Approximation Formula for the Gaussian Error Integral, Q(x) 


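A Gaussian pdf with unit variance is given by:
Equation:

\[
f(x) = \frac{1}{\sqrt{2\pi}}\, e^{-\frac{x^2}{2}}
\]

The probability that a signal with a pdf given by $f(x)$ lies above a given
threshold $x$ is given by the Gaussian Error Integral or $Q$ function:
Equation:

\[
Q(x) = \int_{x}^{\infty} f(u)\,du
\]

There is no analytical solution to this integral, but it has a simple
relationship to the error function, $\mathrm{erf}(x)$, or its complement, $\mathrm{erfc}(x)$,
which are tabulated in many books of mathematical tables.

Equation:

\[
\mathrm{erf}(x) = \frac{2}{\sqrt{\pi}} \int_{0}^{x} e^{-u^2}\,du
\]

and
Equation:

\[
\mathrm{erfc}(x) = 1 - \mathrm{erf}(x) = \frac{2}{\sqrt{\pi}} \int_{x}^{\infty} e^{-u^2}\,du
\]

Therefore,

Equation:

\[
Q(x) = \frac{1}{2}\,\mathrm{erfc}\!\left(\frac{x}{\sqrt{2}}\right) = \frac{1}{2}\left[1 - \mathrm{erf}\!\left(\frac{x}{\sqrt{2}}\right)\right]
\]

Note that $\mathrm{erf}(0) = 0$ and $\mathrm{erf}(\infty) = 1$, and therefore $Q(0) = 0.5$ and
$Q(x) \to 0$ very rapidly as $x$ becomes large.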


It is useful to derive simple approximations to Q(x) which can be used on a 
calculator and avoid the need for tables. 


Let $v = u - x$, then:
Equation:

\[
Q(x) = \int_{0}^{\infty} f(v + x)\,dv
= \frac{e^{-\frac{x^2}{2}}}{\sqrt{2\pi}} \int_{0}^{\infty} e^{-vx}\, e^{-\frac{v^2}{2}}\,dv
\]

Now if $x \gg 1$, we may obtain an approximate solution by replacing the
$e^{-\frac{v^2}{2}}$ term in the integral by unity, since it will initially decay much slower
than the $e^{-vx}$ term. Therefore
Equation:

\[
Q(x) < \frac{e^{-\frac{x^2}{2}}}{\sqrt{2\pi}} \int_{0}^{\infty} e^{-vx}\,dv = \frac{e^{-\frac{x^2}{2}}}{\sqrt{2\pi}\,x}
\]

This approximation is an upper bound, and its ratio to the true value of
$Q(x)$ becomes less than 1.1 only when $x > 3$, as shown in [link]. We may
obtain a much better approximation to $Q(x)$ by altering the denominator
above from $(\sqrt{2\pi}\,x)$ to $(1.64x + \sqrt{0.76x^2 + 4})$ to give:
Equation:

\[
Q(x) \approx \frac{e^{-\frac{x^2}{2}}}{1.64x + \sqrt{0.76x^2 + 4}}
\]


This improved approximation gives a curve indistinguishable from $Q(x)$ in
[link] and its ratio to the true $Q(x)$ is now within $\pm 0.3\,\%$ of unity for all
$x \ge 0$ as shown in [link]. This accuracy is sufficient for nearly all practical
problems.


$Q(x)$ and the simple approximation of [link].


The ratio of the improved approximation of $Q(x)$
in [link] to the true value, obtained by numerical
integration.
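
The two approximations can be compared against a reference value of $Q(x)$ computed from `math.erfc`. This is a sketch for checking the ratios quoted above at a handful of points; the function names and sample points are illustrative, and the reference uses the standard identity $Q(x) = \frac{1}{2}\,\mathrm{erfc}(x/\sqrt{2})$ rather than numerical integration.

```python
import math


def q_exact(x):
    """Reference value of Q(x) via the complementary error function."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def q_simple(x):
    """Upper-bound approximation exp(-x^2/2) / (sqrt(2 pi) x), valid for x > 0."""
    return math.exp(-x * x / 2.0) / (math.sqrt(2.0 * math.pi) * x)


def q_improved(x):
    """Improved approximation with denominator 1.64 x + sqrt(0.76 x^2 + 4)."""
    return math.exp(-x * x / 2.0) / (1.64 * x + math.sqrt(0.76 * x * x + 4.0))


# Ratio of each approximation to the reference value at a few sample points.
for x in (0.5, 1.0, 2.0, 3.0, 5.0):
    print(f"x = {x:.1f}  simple/exact = {q_simple(x) / q_exact(x):.4f}  "
          f"improved/exact = {q_improved(x) / q_exact(x):.4f}")
```

At $x = 0$ the improved formula reduces to $1/\sqrt{4} = 0.5$, which matches $Q(0)$ exactly, whereas the simple bound diverges there.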


Expectation 


Expectations form a fundamental part of random signal theory. In simple 
terms the Expectation Operator calculates the mean of a random quantity 
although the concept turns out to be much more general and useful than just 
this. 


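If $X$ has pdf $f_X(x)$ (correctly normalised so that $\int_{-\infty}^{\infty} f_X(x)\,dx = 1$), its
expectation is given by:
Equation:

\[
E[X] = \int_{-\infty}^{\infty} x\, f_X(x)\,dx = \bar{X}
\]

For discrete processes, we substitute the delta-function form of the pdf, $f_X(x) = \sum_i p_X(x_i)\,\delta(x - x_i)$, in here to get
Equation:

\[
E[X] = \int_{-\infty}^{\infty} x \sum_i p_X(x_i)\,\delta(x - x_i)\,dx = \sum_i x_i\, p_X(x_i) = \bar{X}
\]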


Now, what is the mean value of some function, Y = g(X)? 


Using the result of this previous equation for pdfs of related processes $Y$
and $X$:
Equation:

\[
f_Y(y)\,dy = f_X(x)\,dx
\]

Hence (again assuming infinite integral limits unless stated otherwise)
Equation:

\[
E[g(X)] = E[Y] = \int y\, f_Y(y)\,dy = \int g(x)\, f_X(x)\,dx
\]


This is an important result which allows us to use the Expectation Operator 
for many purposes including the calculation of moments and other related 
parameters of a random process. 


Note, expectation is a Linear Operator: 
Equation: 


\[
E[a\,g_1(X) + b\,g_2(X)] = a\,E[g_1(X)] + b\,E[g_2(X)]
\]


Important Examples of Expectation 


We get Moments of a pdf by setting $g(X) = X^n$ in this previous equation,
Equation:
nth order moment

\[
E[X^n] = \int x^n f_X(x)\,dx
\]

• $n = 1$: 1st order moment, $E[X]$ = Mean value
• $n = 2$: 2nd order moment, $E[X^2]$ = Mean-squared value (Power or energy)
• $n > 2$: Higher order moments, $E[X^n]$, give more detail about $f_X(x)$.


Central Moments
Central moments are moments about the centre or mean of a distribution,

Equation:
nth order central moment

\[
E[(X - \bar{X})^n] = \int (x - \bar{X})^n f_X(x)\,dx
\]


Some important parameters from central moments of a pdf are: 


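• Variance, $n = 2$:
Equation:

\[
\begin{aligned}
\sigma^2 = E[(X - \bar{X})^2] &= \int (x - \bar{X})^2 f_X(x)\,dx \\
&= \int x^2 f_X(x)\,dx - 2\bar{X} \int x f_X(x)\,dx + \bar{X}^2 \int f_X(x)\,dx \\
&= E[X^2] - 2\bar{X}^2 + \bar{X}^2 \\
&= E[X^2] - \bar{X}^2
\end{aligned}
\]

Standard deviation, $\sigma = \sqrt{\text{variance}}$.

• Skewness, $n = 3$:
Equation:

\[
\gamma = \frac{E[(X - \bar{X})^3]}{\sigma^3}
\]

$\gamma = 0$ if the pdf of $X$ is symmetric about $\bar{X}$, and becomes more
positive if the tail of the distribution is heavier when $X > \bar{X}$.

• Kurtosis (or excess), $n = 4$:
Equation:

\[
\kappa = \frac{E[(X - \bar{X})^4]}{\sigma^4} - 3
\]

$\kappa = 0$ for a Gaussian pdf and becomes more positive for distributions
with heavier tails.


Note: Skewness and kurtosis are normalized by dividing the central
moments by appropriate powers of $\sigma$ to make them dimensionless.
Kurtosis is usually offset by $-3$ to make it zero for Gaussian pdfs.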
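
For a discrete variable the expectation operator is just a weighted sum, which makes these moment definitions easy to try out. The sketch below applies them to a single fair die using exact fractions; the helper names (`expectation`, `central_moment`) are illustrative and not taken from the notes.

```python
from fractions import Fraction


def expectation(pmf, g=lambda x: x):
    """E[g(X)] = sum_i g(x_i) p_X(x_i) for a discrete pmf given as {x_i: p_i}."""
    return sum(g(x) * p for x, p in pmf.items())


def central_moment(pmf, n):
    """nth order central moment E[(X - mean)^n]."""
    mean = expectation(pmf)
    return expectation(pmf, lambda x: (x - mean) ** n)


die = {face: Fraction(1, 6) for face in range(1, 7)}

mean = expectation(die)                               # 1st order moment
mean_square = expectation(die, lambda x: x * x)       # 2nd order moment
variance = central_moment(die, 2)
third_central = central_moment(die, 3)                # zero for a symmetric pmf
kurtosis = central_moment(die, 4) / variance ** 2 - 3

print(mean, mean_square, variance, third_central, kurtosis)
```

The negative kurtosis reflects the fact that the uniform pmf of a die has lighter tails than a Gaussian.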


Example: Central Moments of a Normal Distribution 


The normal (or Gaussian) pdf with zero mean is given by:
Equation:

\[
f_X(x) = \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-\frac{x^2}{2\sigma^2}}
\]

What is the nth order central moment for the Gaussian?


Since the mean is zero, the nth order central moment is given by
Equation:

\[
E[X^n] = \int x^n f_X(x)\,dx = \frac{1}{\sqrt{2\pi\sigma^2}} \int x^n e^{-\frac{x^2}{2\sigma^2}}\,dx
\]

$f_X(x)$ is a function of $x^2$ and therefore is symmetric about zero. So all the
odd-order moments will integrate to zero (including the 1st-order moment,
giving zero mean). The even-order moments are then given by:

Equation:

\[
E[X^n] = \frac{2}{\sqrt{2\pi\sigma^2}} \int_{0}^{\infty} x^n e^{-\frac{x^2}{2\sigma^2}}\,dx
\]

where $n$ is even. The integral is calculated by substituting $u = \frac{x^2}{2\sigma^2}$ to give:
Equation:

\[
\int_{0}^{\infty} x^n e^{-\frac{x^2}{2\sigma^2}}\,dx
= \frac{1}{2} (2\sigma^2)^{\frac{n+1}{2}} \int_{0}^{\infty} u^{\frac{n-1}{2}} e^{-u}\,du
= \frac{1}{2} (2\sigma^2)^{\frac{n+1}{2}}\, \Gamma\!\left(\frac{n+1}{2}\right)
\]

Here $\Gamma(z)$ is the Gamma function, which is defined as an integral for all
real $z > 0$ and is rather like the factorial function but generalized to allow
non-integer arguments. Values of the Gamma function can be found in
mathematical tables. It is defined as follows:

Equation:

\[
\Gamma(z) = \int_{0}^{\infty} u^{z-1} e^{-u}\,du
\]

and has the important (factorial-like) property that
Equation:

\[
\forall z,\; z \ne 0: \quad \Gamma(z + 1) = z\,\Gamma(z)
\]

Equation:

\[
\forall z,\; z \in \mathbb{Z} \wedge (z > 0): \quad \Gamma(z + 1) = z!
\]


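The following results hold for the Gamma function (see below for a way to
evaluate $\Gamma(\frac{1}{2})$ etc.):
Equation:

\[
\Gamma\!\left(\tfrac{1}{2}\right) = \sqrt{\pi}
\]

Equation:

\[
\Gamma(1) = 1
\]

and hence

Equation:

\[
\Gamma\!\left(\tfrac{3}{2}\right) = \frac{\sqrt{\pi}}{2}
\]

Equation:

\[
\Gamma(2) = 1
\]

Hence
Equation:

\[
E[X^n] =
\begin{cases}
0 & \text{if } n = \text{odd} \\[1ex]
\dfrac{(2\sigma^2)^{\frac{n}{2}}\, \Gamma\!\left(\frac{n+1}{2}\right)}{\sqrt{\pi}} & \text{if } n = \text{even}
\end{cases}
\]

• Valid pdf, $n = 0$:
Equation:

\[
E[X^0] = \frac{(2\sigma^2)^{0}\, \Gamma\!\left(\frac{1}{2}\right)}{\sqrt{\pi}} = 1
\]

as required for a valid pdf.

Note: The normalization factor $\frac{1}{\sqrt{2\pi\sigma^2}}$ in the expression for the pdf of
a Gaussian (e.g. the unit variance case in [link]) arises directly from the above
result.

• Mean, $n = 1$:
Equation:

\[
E[X] = 0
\]

so the mean is zero.

• Variance, $n = 2$:
Equation:

\[
E[X^2] = \frac{2\sigma^2\, \Gamma\!\left(\frac{3}{2}\right)}{\sqrt{\pi}} = \frac{2\sigma^2 \cdot \frac{1}{2}\sqrt{\pi}}{\sqrt{\pi}} = \sigma^2
\]

Therefore standard deviation $= \sqrt{\text{variance}} = \sigma$.

• Skewness, $n = 3$:
Equation:

\[
E[X^3] = 0
\]

so the skewness is zero.

• Kurtosis, $n = 4$:
Equation:

\[
E[X^4] = \frac{(2\sigma^2)^2\, \Gamma\!\left(\frac{5}{2}\right)}{\sqrt{\pi}} = \frac{4\sigma^4 \cdot \frac{3}{4}\sqrt{\pi}}{\sqrt{\pi}} = 3\sigma^4
\]

Hence
Equation:

\[
\kappa = \frac{E[X^4]}{\sigma^4} - 3 = 0
\]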
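
The moments of a zero-mean Gaussian can be evaluated directly with the Gamma function from Python's `math` module. The sketch assumes the standard closed form for even $n$, $E[X^n] = (2\sigma^2)^{n/2}\,\Gamma\!\left(\frac{n+1}{2}\right)/\sqrt{\pi}$, which is the result this example derives; the function name and the value of `sigma` are illustrative.

```python
import math


def gaussian_moment(n, sigma):
    """nth moment of a zero-mean Gaussian with standard deviation sigma."""
    if n % 2 == 1:
        return 0.0                       # odd moments vanish by symmetry
    return (2.0 * sigma ** 2) ** (n / 2) * math.gamma((n + 1) / 2) / math.sqrt(math.pi)


sigma = 1.3                              # arbitrary value chosen for illustration
for n in range(0, 7):
    print(n, gaussian_moment(n, sigma))

# Kurtosis: fourth moment over sigma^4, offset by 3.
kurtosis = gaussian_moment(4, sigma) / sigma ** 4 - 3.0
print("kurtosis =", kurtosis)
```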


Evaluation of the Gamma Function 


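From the definition of $\Gamma$ and substituting $u = x^2$:

Equation:

\[
\begin{aligned}
\Gamma\!\left(\tfrac{1}{2}\right) &= \int_{0}^{\infty} u^{-\frac{1}{2}} e^{-u}\,du \\
&= \int_{0}^{\infty} x^{-1} e^{-x^2}\, 2x\,dx \\
&= 2 \int_{0}^{\infty} e^{-x^2}\,dx \\
&= \int_{-\infty}^{\infty} e^{-x^2}\,dx
\end{aligned}
\]

Using the following squaring trick to convert this to a 2-D integral in polar
coordinates:
Equation:

\[
\begin{aligned}
\Gamma^2\!\left(\tfrac{1}{2}\right) &= \int_{-\infty}^{\infty} e^{-x^2}\,dx \int_{-\infty}^{\infty} e^{-y^2}\,dy \\
&= \int_{-\infty}^{\infty}\!\int_{-\infty}^{\infty} e^{-(x^2 + y^2)}\,dx\,dy \\
&= \int_{-\pi}^{\pi}\!\int_{0}^{\infty} e^{-r^2}\, r\,dr\,d\theta \\
&= \int_{-\pi}^{\pi} \left[-\tfrac{1}{2} e^{-r^2}\right]_{0}^{\infty} d\theta \\
&= \int_{-\pi}^{\pi} \tfrac{1}{2}\,d\theta = \pi
\end{aligned}
\]

and so (ignoring the negative square root):
Equation:

\[
\Gamma\!\left(\tfrac{1}{2}\right) = \sqrt{\pi} \approx 1.7725
\]

Hence, using $\Gamma(z + 1) = z\,\Gamma(z)$:
Equation:

\[
\Gamma\!\left(\left\{\tfrac{3}{2}, \tfrac{5}{2}, \tfrac{7}{2}, \tfrac{9}{2}, \ldots\right\}\right)
= \left\{\tfrac{1}{2}\sqrt{\pi},\; \tfrac{3}{4}\sqrt{\pi},\; \tfrac{15}{8}\sqrt{\pi},\; \tfrac{105}{16}\sqrt{\pi},\; \ldots\right\}
\]

The case for $z = 1$ is straightforward:

Equation:

\[
\Gamma(1) = \int_{0}^{\infty} u^{0} e^{-u}\,du = \left[-e^{-u}\right]_{0}^{\infty} = 1
\]

so
Equation:

\[
\Gamma(\{2, 3, 4, 5, \ldots\}) = \{1, 2, 6, 24, \ldots\}
\]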


Sums of Random Variables 


Consider the random variable $Y$ formed as the sum of two independent
random variables $X_1$ and $X_2$:
Equation:

\[
Y = X_1 + X_2
\]

where $X_1$ has pdf $f_1(x_1)$ and $X_2$ has pdf $f_2(x_2)$.


We can write the joint pdf for $y$ and $x_1$ by rewriting the conditional
probability formula:
Equation:

\[
f(y, x_1) = f(y \mid x_1)\, f_1(x_1)
\]

It is clear that the event '$Y$ takes the value $y$ conditional upon $X_1 = x_1$' is
equivalent to $X_2$ taking a value $y - x_1$ (since $X_2 = Y - X_1$). Hence
Equation:

\[
f(y \mid x_1) = f_2(y - x_1)
\]

Now $f(y)$ may be obtained using the Marginal Probability formula (this
equation from this discussion of probability density functions). Hence
Equation:

\[
f(y) = \int f(y \mid x_1)\, f_1(x_1)\,dx_1 = \int f_2(y - x_1)\, f_1(x_1)\,dx_1 = f_2 * f_1
\]

This result may be extended to sums of three or more random variables by
repeated application of the above arguments for each new variable in turn.
Since convolution is a commutative operation, for $n$ independent variables
we get:

Equation:

\[
f(y) = f_n * (f_{n-1} * \cdots * f_2 * f_1) = f_n * f_{n-1} * \cdots * f_2 * f_1
\]


An example of this effect occurs when multiple dice are thrown and the 
scores are added together. In the 2-dice example of the subfigures a,b,c of 
this figure in the discussion of probability distributions, we saw how the 
pmf approximated a triangular shape. This is just the convolution of two 
uniform 6-point pmfs for each of the two dice. 


Similarly if two variables with Gaussian pdfs are added together, we shall 
show in the discussion of the summation of two or more Gaussian random 
variables that this produces another Gaussian pdf whose variance is the sum 
of the two input variances. 
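
For discrete variables the convolution integral becomes a convolution sum over the pmfs, which is short enough to write out directly. The sketch below reproduces the triangular 2-dice pmf from two uniform 6-point pmfs and extends it to three dice; the `convolve` helper is an illustrative implementation rather than a library routine.

```python
from fractions import Fraction


def convolve(p, q):
    """Discrete convolution of two pmfs given as lists of probabilities."""
    out = [Fraction(0)] * (len(p) + len(q) - 1)
    for i, pi in enumerate(p):
        for j, qj in enumerate(q):
            out[i + j] += pi * qj
    return out


die = [Fraction(1, 6)] * 6             # faces 1..6, index k <-> face k + 1

two_dice = convolve(die, die)          # index k <-> total k + 2
three_dice = convolve(two_dice, die)   # index k <-> total k + 3

print([str(p) for p in two_dice])      # rises linearly to a peak at total 7, then falls
print(sum(three_dice))                 # a valid pmf still sums to 1
```

Because convolution is commutative, `convolve(two_dice, die)` and `convolve(die, two_dice)` give the same three-dice pmf.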


Characteristic Functions 


You have already encountered the Moment Generating Function of a pdf in the Part IB probability course. This 
function was closely related to the Laplace Transform of the pdf. 


Now we introduce the Characteristic Function for a random variable, which is closely related to the Fourier 
Transform of the pdf. 


In the same way that Fourier Transforms allow easy manipulation of signals when they are convolved with linear 
system impulse responses, Characteristic Functions allow easy manipulation of convolved pdfs when they 
represent sums of random processes. 


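The Characteristic Function of a pdf is defined as:
Equation:

\[
\Phi_X(u) = E[e^{iuX}] = \int e^{iux} f_X(x)\,dx = \mathcal{F}(-u)
\]

where $\mathcal{F}(u)$ is the Fourier Transform of the pdf.
Note that whenever $f_X$ is a valid pdf, $\Phi(0) = \int f_X(x)\,dx = 1$.
Properties of Fourier Transforms apply with $-u$ substituted for $\omega$. In particular:


• Convolution - (sums of independent rv's)
Equation:

\[
\left(Y = \sum_{i=1}^{N} X_i\right) \;\Rightarrow\; \left(f_Y = f_{X_1} * f_{X_2} * \cdots * f_{X_N}\right) \;\Rightarrow\; \left(\Phi_Y(u) = \prod_{i=1}^{N} \Phi_{X_i}(u)\right)
\]

• Inversion

Equation:

\[
f_X(x) = \frac{1}{2\pi} \int e^{-iux}\, \Phi_X(u)\,du
\]

• Moments
Equation:

\[
\left(\frac{d^n \Phi_X(u)}{du^n} = \int (ix)^n e^{iux} f_X(x)\,dx\right) \;\Rightarrow\; \left(E[X^n] = \int x^n f_X(x)\,dx = \frac{1}{i^n} \left.\frac{d^n \Phi_X(u)}{du^n}\right|_{u=0}\right)
\]

• Scaling: If $Y = aX$, $f_Y(y) = \frac{f_X(x)}{a}$ from this equation in our previous discussion of functions of random
variables, then
Equation:

\[
\Phi_Y(u) = \int e^{iuy} f_Y(y)\,dy = \int e^{iuax} f_X(x)\,dx = \Phi_X(au)
\]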


Characteristic Function of a Gaussian pdf 


The Gaussian or normal distribution is very important, largely because of the Central Limit Theorem which we 
shall prove below. Because of this (and as part of the proof of this theorem) we shall show here that a Gaussian pdf 
has a Gaussian characteristic function too. 


A Gaussian distribution with mean y and variance o? has pdf: 
Equation: 


Its characteristic function is obtained as follows, using a trick known as completing the square of the exponent: 
Equation: 


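\[
\begin{aligned}
\Phi_X(u) &= E\left[e^{iuX}\right] \\
&= \int e^{iux} f_X(x)\,dx \\
&= \frac{1}{\sqrt{2\pi\sigma^2}} \int e^{-\frac{x^2 - 2\mu x + \mu^2 - 2\sigma^2 iux}{2\sigma^2}}\,dx \\
&= \left( \frac{1}{\sqrt{2\pi\sigma^2}} \int e^{-\frac{\left(x - (\mu + iu\sigma^2)\right)^2}{2\sigma^2}}\,dx \right) e^{\frac{2iu\sigma^2\mu - u^2\sigma^4}{2\sigma^2}} \\
&= e^{iu\mu}\, e^{-\frac{u^2\sigma^2}{2}}
\end{aligned}
\]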


since the integral in brackets is similar to a Gaussian pdf and integrates to unity. 

Thus the characteristic function of a Gaussian pdf is also Gaussian in magnitude, $e^{-u^2\sigma^2/2}$, with standard deviation
$1/\sigma$, and with a linear phase rotation term, $e^{iu\mu}$, whose rate of rotation equals the mean $\mu$ of the pdf. This coincides
with standard results from Fourier analysis of Gaussian waveforms and their spectra (e.g. Fourier transform of a
Gaussian waveform with time shift).
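As a sanity check on the closed form $e^{iu\mu} e^{-u^2\sigma^2/2}$, the integral defining the characteristic function can be evaluated numerically. The following is a minimal sketch; the values of `mu`, `sigma` and `u`, and the integration grid, are arbitrary assumptions made for illustration.

```python
import numpy as np

mu, sigma = 1.5, 0.8          # example mean and standard deviation (assumed values)
u = 2.0                       # argument of the characteristic function (assumed value)

# Fine grid spanning +/- 10 standard deviations around the mean
x = np.linspace(mu - 10 * sigma, mu + 10 * sigma, 20001)
dx = x[1] - x[0]
pdf = np.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / np.sqrt(2 * np.pi * sigma ** 2)

# Numerical evaluation of E[exp(i*u*X)] as a Riemann sum
phi_numeric = np.sum(np.exp(1j * u * x) * pdf) * dx

# Closed form: Gaussian magnitude times linear phase term
phi_closed = np.exp(1j * u * mu) * np.exp(-(u ** 2) * sigma ** 2 / 2)

print(phi_numeric, phi_closed)
```

The two complex numbers should agree closely: the magnitude depends only on $\sigma$ and the phase angle is $u\mu$.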


Summation of two or more Gaussian random variables 


If two independent variables, $X_1$ and $X_2$, with Gaussian pdfs are summed to produce $X$, their characteristic functions will be
multiplied together (equivalent to convolving their pdfs) to give
Equation:

\[
\begin{aligned}
\Phi_X(u) &= \Phi_{X_1}(u)\,\Phi_{X_2}(u) \\
&= e^{iu(\mu_1 + \mu_2)}\, e^{-\frac{u^2(\sigma_1^2 + \sigma_2^2)}{2}}
\end{aligned}
\]

This is the characteristic function of a Gaussian pdf with mean $(\mu_1 + \mu_2)$ and variance $(\sigma_1^2 + \sigma_2^2)$.


Further Gaussian variables can be added and the pdf will remain Gaussian with further terms added to the above 
expressions for the combined mean and variance. 


Central Limit Theorem 


The central limit theorem states broadly that if a large number $N$ of independent random variables of arbitrary pdf,
but with equal variance $\sigma^2$ and zero mean, are summed together and scaled by $1/\sqrt{N}$ to keep the total energy
independent of $N$, then the pdf of the resulting variable will tend to a zero-mean Gaussian with variance $\sigma^2$ as $N$
tends to infinity.


This result is obvious from the previous result if the input pdfs are also Gaussian, but it is the fact that it applies 
for arbitrary input pdfs that is remarkable, and is the reason for the importance of the Gaussian (or normal) pdf. 
Noise generated in nature is nearly always the result of summing many tiny random processes (e.g. noise from 
electron energy transitions in a resistor or transistor, or from distant worldwide thunder storms at a radio antenna) 
and hence tends to a Gaussian pdf. 


Although for simplicity, we shall prove the result only for the case when all the summed processes have the same 
variance and pdfs, the central limit result is more general than this and applies in many cases even when the 
variance and pdfs are not all the same. 


Proof: 


Let $X_i$ ($i = 1$ to $N$) be the $N$ independent random processes, each with zero mean and variance $\sigma^2$, which are
combined to give


Equation:
\[
X = \frac{1}{\sqrt{N}} \sum_{i=1}^{N} X_i
\]


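Then, if the characteristic function of each input process before scaling is $\Phi(u)$ and we use [link] to include the
scaling by $1/\sqrt{N}$, the characteristic function of $X$ is


Equation:
\[
\begin{aligned}
\Phi_X(u) &= \prod_{i=1}^{N} \Phi_{X_i}\!\left(\frac{u}{\sqrt{N}}\right) \\
&= \Phi^N\!\left(\frac{u}{\sqrt{N}}\right)
\end{aligned}
\]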
Taking logs: 
Equation: 


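\[
\log \Phi_X(u) = N \log \Phi\!\left(\frac{u}{\sqrt{N}}\right)
\]


Using Taylor's theorem to expand $\Phi\!\left(\frac{u}{\sqrt{N}}\right)$ in terms of its derivatives at $u = 0$ (and hence its moments) gives


Equation:

\[
\Phi\!\left(\frac{u}{\sqrt{N}}\right) = \Phi(0) + \frac{u}{\sqrt{N}}\,\Phi'(0) + \frac{1}{2}\left(\frac{u}{\sqrt{N}}\right)^{2}\Phi''(0) + \frac{1}{3!}\left(\frac{u}{\sqrt{N}}\right)^{3}\Phi'''(0) + \frac{1}{4!}\left(\frac{u}{\sqrt{N}}\right)^{4}\Phi''''(0) + \cdots
\]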


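From the Moments property of characteristic functions with zero mean:

• valid pdf: $\Phi(0) = 1$
• zero mean: $\Phi'(0) = i\,E[X_i] = 0$
• variance: $\Phi''(0) = i^2 E[X_i^2] = -\sigma^2$
• scaled skewness: $\Phi'''(0) = i^3 E[X_i^3]$
• scaled kurtosis: $\Phi''''(0) = i^4 E[X_i^4] = (\kappa + 3)\sigma^4$

These are all constants, independent of $N$, and dependent only on the shape of the pdfs $f_{X_i}$.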


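Substituting these moments into [link] and [link] and using the series expansion, $\log(1 + x) = x + (\text{terms of order } x^2 \text{ or smaller})$, gives
Equation:

\[
\begin{aligned}
\log \Phi_X(u) &= N \log \Phi\!\left(\frac{u}{\sqrt{N}}\right) \\
&= N \log\left(1 - \frac{u^2\sigma^2}{2N} + \text{**}\right) \\
&= N\left(-\frac{u^2\sigma^2}{2N} + \text{**}\right) \\
&= -\frac{u^2\sigma^2}{2} + \text{\#\#}
\end{aligned}
\]

where ** represents the terms of order $N^{-3/2}$ or smaller and ## represents the terms of order $N^{-1/2}$ or smaller. As
$N \to \infty$,

\[
\log \Phi_X(u) \to -\frac{u^2\sigma^2}{2}
\]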

Therefore, as $N \to \infty$
Equation:

\[
\Phi_X(u) \to e^{-\frac{u^2\sigma^2}{2}}
\]

Note that, if the input pdfs are symmetric, the skewness will be zero and the error terms will decay as $N^{-1}$ rather
than $N^{-1/2}$; and so convergence to a Gaussian characteristic function will be more rapid.


Hence we may now infer from [link], [link] and [link] that the pdf of $X$ as $N \to \infty$ will be given by
Equation:

\[
f_X(x) = \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-\frac{x^2}{2\sigma^2}}
\]

Thus we have proved the required central limit result. 


[link] shows an example of convergence when the input pdfs are uniform, and N is gradually increased from 1 to 
50. By N = 12, convergence is good, and this is how some 'Gaussian' random generator functions operate - by 
summing typically 12 uncorrelated random numbers with uniform pdfs. 
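The "sum of 12 uniforms" idea can be sketched as follows. This is an illustrative version only, not the implementation used by any particular library; the function name `approx_gaussian` and its parameters are hypothetical. Each uniform variable on (0, 1) has mean 1/2 and variance 1/12, so the sum of 12 of them has mean 6 and variance 1.

```python
import numpy as np

def approx_gaussian(n_samples, n_terms=12, rng=None):
    """Approximate zero-mean, unit-variance Gaussian samples by summing uniforms."""
    rng = np.random.default_rng() if rng is None else rng
    u = rng.uniform(0.0, 1.0, size=(n_samples, n_terms))
    # Remove the mean n_terms/2 and normalise by the standard deviation sqrt(n_terms/12)
    return (u.sum(axis=1) - n_terms / 2.0) / np.sqrt(n_terms / 12.0)

samples = approx_gaussian(200_000, rng=np.random.default_rng(0))
print(samples.mean(), samples.var(), np.abs(samples).max())
```

With `n_terms=12` no sample can exceed 6 in magnitude, so the tails of the true Gaussian are truncated; this is one reason the method is only an approximation.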


For some less smooth or more skewed pdfs, convergence can be slower, as shown for a highly skewed triangular 
pdf in [link]; and pdfs of discrete processes are particularly problematic in this respect, as illustrated in [link]. 


Convergence toward a Gaussian pdf (Central Limit Theorem) for 3 different input pdfs for N = 1 to 50. Note
that the uniform pdf (a) with smallest higher-order moments converges fastest. Curves are shown for
N = {1, 2, 3, 4, 6, 8, 10, 12, 15, 20, 30, 50}.


(a) Uniform pdf, N = 50
Random Processes


We discussed Random Signals briefly and now we return to consider them
in detail. We shall assume that they evolve continuously with time $t$,
although they may equally well evolve with distance (e.g. a random texture
in image processing) or some other parameter.


We can imagine a generalization of our previous ideas about random
experiments so that the outcome of an experiment can be a 'Random
Object', an example of which is a signal waveform chosen at random from a
set of possible signal waveforms, which we term an Ensemble. This
ensemble of random signals is known as a Random Process.


Ensemble of Random Signals: ensemble members $X(t, \alpha_i)$ and $X(t, \alpha_{i+1})$ plotted against time $t$, with sampling instants $t_1$ and $t_2$ marked.


Ensemble representation of a random process.


An example of a Random Process $X(t, \alpha)$ is shown in [link], where $t$ is
time and $\alpha$ is an index to the various members of the ensemble.


• $t$ is assumed to belong to some set (the time axis).

• $\alpha$ is assumed to belong to some set (the sample space).

• If the time axis is a continuous set, such as $\mathbb{R}$ or $(0, \infty)$, then the process is termed
a Continuous Time random process.

• If the time axis is a discrete set of time values, such as the integers $\mathbb{Z}$, the process
is termed a Discrete Time Process or Time Series.

• The members of the ensemble can be the result of different random
events, such as different instances of the sound 'ah' during the course
of this lecture. In this case $\alpha$ is discrete.

• Alternatively the ensemble members are often just different portions of
a single random signal. If the signal is a continuous waveform, then $\alpha$
may also be a continuous variable, indicating the starting point of each
ensemble waveform.


We will often drop the explicit dependence on $\alpha$ for notational convenience,
referring simply to random process $\{X(t)\}$.


If we consider the process $\{X(t)\}$ at one particular time $t = t_1$, then we
have a random variable $X(t_1)$.


If we consider the process $\{X(t)\}$ at $N$ time instants $\{t_1, t_2, \ldots, t_N\}$, then
we have a random vector:


\[
\mathbf{X} = \left[ X(t_1) \;\; X(t_2) \;\; \ldots \;\; X(t_N) \right]^T
\]


We can study the properties of a random process by considering the 
behavior of random variables and random vectors extracted from the 
process, using the probability theory derived earlier in this course. 


Correlation and Covariance 


Correlation and covariance are techniques for measuring the similarity of one
signal to another. For a random process $X(t, \alpha)$ they are defined as follows.


• Auto-correlation function:
Equation:

\[
\begin{aligned}
r_{XX}(t_1, t_2) &= E[X(t_1, \alpha)\,X(t_2, \alpha)] \\
&= \int\!\!\int x_1 x_2\, f(x_1, x_2)\,dx_1\,dx_2
\end{aligned}
\]

where the expectation is performed over all $\alpha$ in the sample space (i.e. the whole
ensemble), and $f(x_1, x_2)$ is the joint pdf when $x_1$ and $x_2$ are samples
taken at times $t_1$ and $t_2$ from the same random event $\alpha$ of the random
process $X$.

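• Auto-covariance function:
Equation:

\[
\begin{aligned}
c_{XX}(t_1, t_2) &= E\left[ \left(X(t_1, \alpha) - \overline{X(t_1)}\right) \left(X(t_2, \alpha) - \overline{X(t_2)}\right) \right] \\
&= \int\!\!\int \left(x_1 - \overline{X(t_1)}\right) \left(x_2 - \overline{X(t_2)}\right) f(x_1, x_2)\,dx_1\,dx_2 \\
&= r_{XX}(t_1, t_2) - 2\,\overline{X(t_1)}\;\overline{X(t_2)} + \overline{X(t_1)}\;\overline{X(t_2)} \\
&= r_{XX}(t_1, t_2) - \overline{X(t_1)}\;\overline{X(t_2)}
\end{aligned}
\]

where the same conditions apply as for auto-correlation and the means
$\overline{X(t_1)}$ and $\overline{X(t_2)}$ are taken over all $\alpha$ in the sample space. Covariances are similar to
correlations except that the effects of the means are removed.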

• Cross-correlation function: If we have two different processes, $X(t, \alpha)$
and $Y(t, \alpha)$, both arising as a result of the same random event $\alpha$, then
cross-correlation is defined as
Equation:

\[
\begin{aligned}
r_{XY}(t_1, t_2) &= E[X(t_1, \alpha)\,Y(t_2, \alpha)] \\
&= \int\!\!\int x_1 y_2\, f(x_1, y_2)\,dx_1\,dy_2
\end{aligned}
\]

where $f(x_1, y_2)$ is the joint pdf when $x_1$ and $y_2$ are samples of $X$ and $Y$
taken at times $t_1$ and $t_2$ as a result of the same random event $\alpha$. Again
the expectation is performed over all $\alpha$ in the sample space.

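• Cross-covariance function:
Equation:

\[
\begin{aligned}
c_{XY}(t_1, t_2) &= E\left[ \left(X(t_1, \alpha) - \overline{X(t_1)}\right) \left(Y(t_2, \alpha) - \overline{Y(t_2)}\right) \right] \\
&= \int\!\!\int \left(x_1 - \overline{X(t_1)}\right) \left(y_2 - \overline{Y(t_2)}\right) f(x_1, y_2)\,dx_1\,dy_2 \\
&= r_{XY}(t_1, t_2) - \overline{X(t_1)}\;\overline{Y(t_2)}
\end{aligned}
\]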


For Deterministic Random Processes which depend deterministically on the
random variable $\alpha$ (or some function of it), we can simplify the above integrals
by expressing the joint pdf in that space. E.g. for auto-correlation:

Equation:

\[
\begin{aligned}
r_{XX}(t_1, t_2) &= E[X(t_1, \alpha)\,X(t_2, \alpha)] \\
&= \int x(t_1, \alpha)\,x(t_2, \alpha)\,f(\alpha)\,d\alpha
\end{aligned}
\]
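A standard illustration of a deterministic random process is a sinusoid with random phase. The sketch below assumes, purely as an example, $X(t, \alpha) = \cos(\omega t + \alpha)$ with $\alpha$ uniform on $[0, 2\pi)$, so that $f(\alpha) = 1/(2\pi)$; the function name and the value of `omega` are hypothetical. It evaluates the single integral over $\alpha$ numerically, and the result should match the known closed form $\tfrac{1}{2}\cos(\omega(t_2 - t_1))$.

```python
import numpy as np

def acf_random_phase(t1, t2, omega=2.0, n_alpha=4096):
    """r_XX(t1, t2) for X(t, alpha) = cos(omega*t + alpha), alpha ~ Uniform[0, 2*pi)."""
    # Uniform grid over one full period of alpha
    alpha = np.arange(n_alpha) * (2 * np.pi / n_alpha)
    x1 = np.cos(omega * t1 + alpha)
    x2 = np.cos(omega * t2 + alpha)
    # Integral of x1*x2*f(alpha) d(alpha) with f = 1/(2*pi) is the mean over the grid
    return np.mean(x1 * x2)

r_a = acf_random_phase(0.3, 1.1)    # time difference 0.8
r_b = acf_random_phase(5.3, 6.1)    # same time difference, later in time
print(r_a, r_b, 0.5 * np.cos(2.0 * 0.8))
```

The two estimates coincide because, for this example process, the auto-correlation depends only on the time difference.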


Stationarity 


Stationarity in a Random Process implies that its statistical characteristics
do not change with time. Put another way, if one were to observe a
stationary random process at some time $t$ it would be impossible to
distinguish the statistical characteristics at that time from those at some
other time $t'$.


Strict Sense Stationarity (SSS) 


Choose a Random Vector of length $N$ from a Random Process:
Equation:

\[
\mathbf{X} = \left[ X(t_1) \;\; X(t_2) \;\; \ldots \;\; X(t_N) \right]^T
\]
Its $N$th order cdf is
Equation:

\[
F_{X(t_1), \ldots, X(t_N)}(x_1, \ldots, x_N) = \Pr\{X(t_1) \le x_1, \ldots, X(t_N) \le x_N\}
\]

$X(t)$ is defined to be Strict Sense Stationary iff:


Equation:
\[
F_{X(t_1), \ldots, X(t_N)}(x_1, \ldots, x_N) = F_{X(t_1 + c), \ldots, X(t_N + c)}(x_1, \ldots, x_N)
\]
for all time shifts $c$, all finite $N$ and all sets of time points $\{t_1, \ldots, t_N\}$.


Wide Sense (Weak) Stationarity (WSS) 


If we are only interested in the properties of moments up to 2nd order 
(mean, autocorrelation, covariance, ...), which is the case for many practical 
applications, a weaker form of stationarity can be useful: 


X(t) is defined to be Wide Sense Stationary (or Weakly Stationary) iff: 


1. The mean value is independent of $t$, for all $t$


Equation:
\[
E[X(t)] = \mu
\]
2. Autocorrelation depends only upon $\tau = t_2 - t_1$, for all $t_1$
Equation:
\[
\begin{aligned}
E[X(t_1)X(t_2)] &= E[X(t_1)X(t_1 + \tau)] \\
&= r_{XX}(\tau)
\end{aligned}
\]

Note that, since 2nd-order moments are defined in terms of 2nd-order
probability distributions, strict sense stationary processes are always
wide-sense stationary, but not necessarily vice versa.


Ergodicity 


Many stationary random processes are also Ergodic. For an Ergodic 
Random Process we can exchange Ensemble Averages for Time 
Averages. This is equivalent to assuming that our ensemble of random 
signals is just composed of all possible time shifts of a single signal X(t). 


Recall from our previous discussion of Expectation that the expectation of a
function of a random variable is given by
Equation:

\[
E[g(X)] = \int g(x)\,f_X(x)\,dx
\]

This result also applies if we have a random function $g(\cdot)$ of a
deterministic variable such as $t$. Hence
Equation:

\[
E[g(t)] = \int g(t)\,f_T(t)\,dt
\]

Because $t$ is linearly increasing, the pdf $f_T(t)$ is uniform over our
measurement interval, say $-T$ to $T$, and will be $\frac{1}{2T}$ to make the pdf valid
(integral = 1). Hence

Equation:

\[
\begin{aligned}
E[g(t)] &= \int_{-T}^{T} g(t)\,\frac{1}{2T}\,dt \\
&= \frac{1}{2T} \int_{-T}^{T} g(t)\,dt
\end{aligned}
\]


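If we wish to measure over all time, then we take the limit as $T \to \infty$.
This leads to the following results for Ergodic WSS random processes:


• Mean Ergodic:
Equation:

\[
\begin{aligned}
E[X(t)] &= \int_{-\infty}^{\infty} x\,f_{X(t)}(x)\,dx \\
&= \lim_{T \to \infty} \frac{1}{2T} \int_{-T}^{T} X(t)\,dt
\end{aligned}
\]

• Correlation Ergodic:
Equation:

\[
\begin{aligned}
r_{XX}(\tau) &= E[X(t)X(t+\tau)] \\
&= \int_{-\infty}^{\infty}\!\int_{-\infty}^{\infty} x_1 x_2\, f_{X(t),X(t+\tau)}(x_1, x_2)\,dx_1\,dx_2 \\
&= \lim_{T \to \infty} \frac{1}{2T} \int_{-T}^{T} X(t)X(t+\tau)\,dt
\end{aligned}
\]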


and similarly for other correlation or covariance functions. 


Ergodicity greatly simplifies the measurement of WSS processes and it 
is often assumed when estimating moments (or correlations) for such 
processes. 


In almost all practical situations, processes are stationary only over some
limited time interval (say $T_1$ to $T_2$) rather than over all time. In that case
we deliberately keep the limits of the integral finite and adjust $f_{X(t)}$
accordingly. For example the autocorrelation function is then measured
using

Equation:

\[
r_{XX}(\tau) = \frac{1}{T_2 - T_1} \int_{T_1}^{T_2} X(t)X(t+\tau)\,dt
\]


This avoids including samples of X which have incorrect statistics, but it 
can suffer from errors due to limited sample size. 
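For sampled data, the finite-interval time average becomes a sum over the available samples. The sketch below is a discrete-time analogue under stated assumptions: the function name `time_average_acf` is hypothetical, and the test process $X[n] = W[n] + 0.5\,W[n-1]$ (with $W$ white, unit variance) is chosen only because its true ACF is known: $r_{XX}[0] = 1.25$, $r_{XX}[\pm 1] = 0.5$ and zero elsewhere.

```python
import numpy as np

def time_average_acf(x, max_lag):
    """Estimate r_XX[k], k = 0..max_lag, by averaging x[n]*x[n+k] over the record."""
    x = np.asarray(x, dtype=float)
    n = x.size
    # For lag k only n - k products are available, so divide by n - k
    return np.array([np.dot(x[: n - k], x[k:]) / (n - k) for k in range(max_lag + 1)])

rng = np.random.default_rng(1)
n = 100_000
w = rng.standard_normal(n + 1)       # white Gaussian samples, unit variance
x = w[1:] + 0.5 * w[:-1]             # example process (assumed): X[n] = W[n] + 0.5 W[n-1]

r_hat = time_average_acf(x, max_lag=3)
print(r_hat)                         # compare with the true values [1.25, 0.5, 0, 0]
```

The residual differences from the true values illustrate the errors due to limited sample size; they shrink roughly as the inverse square root of the record length.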


Spectral Properties of Random Signals 


Relation of Power spectral Density to ACF 


The autocorrelation function (ACF) of an ergodic random signal tells us how correlated
the signal is with itself as a function of time shift $\tau$. In particular, for $\tau = 0$
Equation:

\[
\begin{aligned}
r_{XX}(0) &= \lim_{T \to \infty} \frac{1}{2T} \int_{-T}^{T} X^2(t)\,dt \\
&= \text{mean power of } X(t)
\end{aligned}
\]

Note that if $T \to \infty$, for all $\tau$
Equation:

\[
r_{XX}(\tau) = r_{XX}(-\tau) \le r_{XX}(0)
\]

As $\tau$ becomes large, $X(t)$ and $X(t + \tau)$ will usually become decorrelated and, as long
as $X$ is zero mean, $r_{XX}$ will tend to zero.


Hence the ACF will have its maximum at $\tau = 0$ and decay symmetrically to zero (or
to $\mu^2$, if $\mu \ne 0$) as $|\tau|$ increases.


The width of the ACF (to say its half-power points) tells us how slowly $X$ is fluctuating
or how band-limited it is. [link] shows how the ACF of a rapidly fluctuating (wide-band)
random signal, as in [link] upper plot, decays quickly to zero as $|\tau|$ increases,
whereas, for a slowly fluctuating signal, as in [link] lower plot, the ACF decays much
more slowly.


Illustration of the different properties of wide band (upper) and narrow band
(lower) random signals: (a) the signal waveforms with unit variance; (b) their
autocorrelation functions (ACFs); and (c) their power spectral densities (PSDs). In
(b) and (c), the thin fluctuating curves show the actual values measured from 4000
samples of the random waveforms while the thick smooth curves show the limits
of the ACF and PSD as the lengths of the waveforms tend to infinity.


(a) Random signals with different bandwidths (upper: wide bandwidth; lower: narrow bandwidth)
(b) ACF of signals
(c) PSD of signals

The ACF measures an entirely different aspect of randomness from amplitude
distributions such as pdf and cdf.


As with deterministic signals, we may formalize our ideas of rates of fluctuation by
transforming to the Frequency (Spectral) Domain using the Fourier Transform:
Equation:

\[
\begin{aligned}
F_U(\omega) &= \mathrm{FT}(u(t)) \\
&= \int u(t)\,e^{-i\omega t}\,dt
\end{aligned}
\]


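The Power Spectral Density (PSD) of a random process $X$ is defined to be the Fourier
Transform of its ACF:
Equation:

\[
\begin{aligned}
S_X(\omega) &= \mathrm{FT}(r_{XX}(\tau)) \\
&= \int r_{XX}(\tau)\,e^{-i\omega\tau}\,d\tau
\end{aligned}
\]

Equation:

\[
\begin{aligned}
r_{XX}(\tau) &= \mathrm{FT}^{-1}(S_X(\omega)) \\
&= \frac{1}{2\pi} \int S_X(\omega)\,e^{i\omega\tau}\,d\omega
\end{aligned}
\]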


N.B. {X(t)} must be at least Wide Sense Stationary (WSS). 


From [link] and [link] we see that the mean signal power is given by:
Equation:

\[
\begin{aligned}
r_{XX}(0) &= \frac{1}{2\pi} \int S_X(\omega)\,d\omega \\
&= \int S_X(2\pi f)\,df
\end{aligned}
\]
Hence $S_X$ has units of power per Hertz. Note that we must integrate over all
frequencies, both positive and negative, to get the correct total power.
[link] shows how the PSDs of the signals relate to the ACFs in [link].
Properties of PSDs for real-valued $X(t)$:
1. $S_X(\omega) = S_X(-\omega)$
2. $S_X(\omega)$ is Real-valued
3. $S_X(\omega) \ge 0$


Properties 1 and 2 are because ACFs are real and symmetric about $\tau = 0$; and 3 is
because $S_X$ represents power density.
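The three properties can be checked numerically on a simple example. The sketch below works with a discrete-time analogue (a sum over integer lags instead of an integral over $\tau$), which is an assumption made for convenience, as are the example ACF $r[k] = a^{|k|}$ and the helper name `psd_from_acf`. For this ACF the transform has the known closed form $(1 - a^2)/(1 - 2a\cos\omega + a^2)$.

```python
import numpy as np

def psd_from_acf(r, lags, omega):
    """S(omega) = sum_k r[k] exp(-i*omega*k): discrete-time analogue of FT of the ACF."""
    return np.exp(-1j * np.outer(omega, lags)) @ r

a = 0.8                                   # assumed decay factor of the example ACF
lags = np.arange(-200, 201)
r = a ** np.abs(lags)                     # real, even ACF with r[0] = 1

omega = np.linspace(-np.pi, np.pi, 512, endpoint=False)
S = psd_from_acf(r, lags, omega)
S_mirror = psd_from_acf(r, lags, -omega)  # PSD evaluated at -omega

S_closed = (1 - a ** 2) / (1 - 2 * a * np.cos(omega) + a ** 2)
mean_power = np.mean(S.real)              # (1/2pi) * integral over one period

print(np.max(np.abs(S.imag)), S.real.min(), mean_power)
```

The imaginary part is at rounding level (property 2), `S` matches `S_mirror` (property 1), the minimum is positive (property 3), and the mean power recovers $r[0] = 1$.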


Linear system (filter) with WSS input 


Time domain: $X(t),\ r_{XX}(\tau)$ → [ $h(t)$ ] → $Y(t) = h(t) * X(t),\ r_{YY}(\tau)$
Frequency domain: $S_X(\omega)$ → [ $\mathcal{H}(\omega)$ ] → $S_Y(\omega) = |\mathcal{H}(\omega)|^2 S_X(\omega)$


Block diagram of a linear system with a random
input signal, $X(t)$.


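Let the linear system with input $X(t)$ and output $Y(t)$ have an impulse response $h(t)$,
so


Equation:
\[
\begin{aligned}
Y(t) &= h(t) * X(t) \\
&= \int h(\alpha)\,X(t - \alpha)\,d\alpha
\end{aligned}
\]
Then the ACF of $Y$ is
Equation:
\[
\begin{aligned}
r_{YY}(t_1, t_2) &= E[Y(t_1)Y(t_2)] \\
&= E\left[ \int h(\alpha_1)X(t_1 - \alpha_1)\,d\alpha_1 \int h(\alpha_2)X(t_2 - \alpha_2)\,d\alpha_2 \right] \\
&= E\left[ \int\!\!\int h(\alpha_1)h(\alpha_2)\,X(t_1 - \alpha_1)X(t_2 - \alpha_2)\,d\alpha_1\,d\alpha_2 \right] \\
&= \int\!\!\int h(\alpha_1)h(\alpha_2)\,E[X(t_1 - \alpha_1)X(t_2 - \alpha_2)]\,d\alpha_1\,d\alpha_2 \\
&= \int\!\!\int h(\alpha_1)h(\alpha_2)\,r_{XX}(t_1 - \alpha_1, t_2 - \alpha_2)\,d\alpha_1\,d\alpha_2
\end{aligned}
\]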

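If $X$ is WSS then


Equation:

\[
\begin{aligned}
r_{YY}(\tau) &= E[Y(t)Y(t + \tau)] \\
&= \int\!\!\int h(\alpha_1)h(\alpha_2)\,r_{XX}(\tau + \alpha_1 - \alpha_2)\,d\alpha_1\,d\alpha_2 \\
&= r_{XX}(\tau) * h(-\tau) * h(\tau)
\end{aligned}
\]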


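Taking Fourier transforms:
Equation:

\[
\begin{aligned}
S_Y(\omega) &= \mathrm{FT}(r_{YY}(\tau)) \\
&= \int \left( \int\!\!\int h(\alpha_1)h(\alpha_2)\,r_{XX}(\tau + \alpha_1 - \alpha_2)\,d\alpha_1\,d\alpha_2 \right) e^{-i\omega\tau}\,d\tau \\
&= \int\!\!\int h(\alpha_1)h(\alpha_2) \int r_{XX}(\tau + \alpha_1 - \alpha_2)\,e^{-i\omega\tau}\,d\tau\,d\alpha_1\,d\alpha_2 \\
&= \int\!\!\int h(\alpha_1)h(\alpha_2) \int r_{XX}(\lambda)\,e^{-i\omega(\lambda - \alpha_1 + \alpha_2)}\,d\lambda\,d\alpha_1\,d\alpha_2 \\
&= \int h(\alpha_1)\,e^{i\omega\alpha_1}\,d\alpha_1 \int h(\alpha_2)\,e^{-i\omega\alpha_2}\,d\alpha_2 \int r_{XX}(\lambda)\,e^{-i\omega\lambda}\,d\lambda \\
&= \mathcal{H}^*(\omega)\,\mathcal{H}(\omega)\,S_X(\omega)
\end{aligned}
\]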


where $\mathcal{H}(\omega) = \mathrm{FT}(h(t))$, i.e.:
Equation:

\[
S_Y(\omega) = |\mathcal{H}(\omega)|^2\, S_X(\omega)
\]

Hence the PSD of $Y$ = the PSD of $X$ $\times$ the power gain $|\mathcal{H}|^2$ of the system at
frequency $\omega$.
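The relation between output ACF and output PSD can be verified directly for a white input in discrete time. The sketch below is an illustration under assumptions: the impulse response `h`, the level `P_x` and the frequency grid are arbitrary example values. It forms the output ACF as $P_X$ times the convolution of $h(-\tau)$ with $h(\tau)$, transforms it, and compares the result with $|\mathcal{H}(\omega)|^2 P_X$.

```python
import numpy as np

P_x = 2.0                                  # assumed white-input PSD level
h = np.array([0.5, 1.0, 0.7, -0.2])        # assumed example impulse response

# Output ACF for r_XX = P_x * delta: r_YY = P_x * (h(-tau) * h(tau))
r_yy = P_x * np.convolve(h[::-1], h)       # covers lags -(L-1) .. (L-1)
lags = np.arange(-(h.size - 1), h.size)

omega = np.linspace(-np.pi, np.pi, 257)
# Fourier transform of the output ACF
S_y = np.exp(-1j * np.outer(omega, lags)) @ r_yy
# Frequency response of the filter
H = np.exp(-1j * np.outer(omega, np.arange(h.size))) @ h
S_y_predicted = np.abs(H) ** 2 * P_x

print(np.max(np.abs(S_y - S_y_predicted)))
```

The printed maximum difference should be at rounding level, confirming that transforming the ACF gives the input PSD multiplied by the power gain.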


Thus if a large and important system is subject to random perturbations (e.g. a power
plant subject to random load fluctuations), we may measure $r_{XX}(\tau)$ and $r_{YY}(\tau)$,
transform these to $S_X(\omega)$ and $S_Y(\omega)$, and hence obtain


Equation:
\[
|\mathcal{H}(\omega)| = \sqrt{\frac{S_Y(\omega)}{S_X(\omega)}}
\]

Hence we may measure the system frequency response without taking the plant off
line. But this does not give any information about the phase of $\mathcal{H}(\omega)$.


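However, if instead we measure the Cross-Correlation Function (CCF) between $X$
and $Y$, we get:
Equation:

\[
\begin{aligned}
r_{XY}(t_1, t_2) &= E[X(t_1)Y(t_2)] \\
&= E\left[ X(t_1) \int h(\alpha_2)X(t_2 - \alpha_2)\,d\alpha_2 \right] \\
&= E\left[ \int h(\alpha_2)\,X(t_1)X(t_2 - \alpha_2)\,d\alpha_2 \right] \\
&= \int h(\alpha_2)\,E[X(t_1)X(t_2 - \alpha_2)]\,d\alpha_2 \\
&= \int h(\alpha_2)\,r_{XX}(t_1, t_2 - \alpha_2)\,d\alpha_2
\end{aligned}
\]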


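If $X(t)$, and hence $Y(t)$, are WSS:
Equation:
\[
\begin{aligned}
r_{XY}(\tau) &= E[X(t)Y(t + \tau)] \\
&= \int h(\alpha)\,r_{XX}(\tau - \alpha)\,d\alpha \\
&= h(\tau) * r_{XX}(\tau)
\end{aligned}
\]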


and taking Fourier transforms:
Equation:

\[
\begin{aligned}
S_{XY}(\omega) &= \mathrm{FT}(r_{XY}(\tau)) \\
&= \mathcal{H}(\omega)\,S_X(\omega)
\end{aligned}
\]

where $S_{XY}(\omega)$ is known as the Cross Spectral Density between $X$ and $Y$. Therefore,
Equation:

\[
\mathcal{H}(\omega) = \frac{S_{XY}(\omega)}{S_X(\omega)}
\]

Hence we obtain the amplitude and phase of $\mathcal{H}(\omega)$. As before, this is achieved
without taking the plant off line.


Note that for WSS processes, $r_{XY}(\tau) = r_{YX}(-\tau)$ and that (unlike $r_{XX}$ and $r_{YY}$) these
need not be symmetric about $\tau = 0$. Hence the cross spectral density $S_{XY}(\omega)$ need not
be purely real (unlike $S_X(\omega)$), and the phase of $S_{XY}(\omega)$ gives the phase of $\mathcal{H}(\omega)$.
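A time-domain counterpart of this identification idea is easy to simulate when the input is white, because $r_{XY}(\tau) = h(\tau) * r_{XX}(\tau)$ then reduces to $P_X\,h(\tau)$. The sketch below is a discrete-time illustration under assumptions: the "plant" impulse response `h_true`, the record length and the helper name `cross_correlation` are all hypothetical.

```python
import numpy as np

def cross_correlation(x, y, max_lag):
    """Time-average estimate of r_XY[k] = E[X[n] Y[n+k]] for k = 0..max_lag."""
    n = x.size
    return np.array([np.dot(x[: n - k], y[k:]) / (n - k) for k in range(max_lag + 1)])

rng = np.random.default_rng(2)
n = 200_000
h_true = np.array([1.0, 0.6, -0.3, 0.1])   # hypothetical plant impulse response
x = rng.standard_normal(n)                 # white input perturbation, P_X close to 1
y = np.convolve(x, h_true)[:n]             # plant output: y[m] = sum_a h[a] x[m - a]

r_xy = cross_correlation(x, y, max_lag=5)
p_x = np.mean(x ** 2)                      # measured input power
h_est = r_xy / p_x                         # for white input, r_XY = P_X * h

print(np.round(h_est, 3))
```

The estimate recovers both the size and the sign (and hence, after transforming, the phase) of the response, which the auto-correlation measurements alone cannot provide.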


Physical Interpretation of Power Spectral Density 


Narrowband filter frequency response and PSD of filter 
input and output. 


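Let us pass $X(t)$ through a narrow-band filter of bandwidth $\delta\omega = 2\pi\,\delta f$, as shown
in [link]:

Equation:

\[
\mathcal{H}(\omega) =
\begin{cases}
1 & \text{if } \omega_0 < |\omega| < \omega_0 + \delta\omega \\
0 & \text{otherwise}
\end{cases}
\]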


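Find average power at the filter output (shaded area in [link], divided by $2\pi$):
Equation:

\[
\begin{aligned}
P_Y &= r_{YY}(0) \\
&= \frac{1}{2\pi} \int_{-\infty}^{\infty} S_Y(\omega)\,d\omega \\
&= \frac{1}{2\pi} \int_{-\infty}^{\infty} S_X(\omega)\,|\mathcal{H}(\omega)|^2\,d\omega \\
&= \frac{1}{2\pi} \left( \int_{-(\omega_0 + \delta\omega)}^{-\omega_0} S_X(\omega)\,d\omega + \int_{\omega_0}^{\omega_0 + \delta\omega} S_X(\omega)\,d\omega \right) \\
&\approx 2\,S_X(\omega_0) \times \frac{\delta\omega}{2\pi}
\end{aligned}
\]

since $S_X(-\omega) = S_X(\omega)$. Expressed in terms of $f_0 = \frac{\omega_0}{2\pi}$:
Equation:

\[
P_Y \approx 2\,S_X(2\pi f_0)\,\delta f
\]

with the factor of 2 appearing because our filter responds to both negative and positive
frequency components of $X$.


Hence $S_X$ is indeed a Power Spectral Density with units $\mathrm{V}^2/\mathrm{Hz}$ (assuming unit
impedance).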
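The quality of the narrow-band approximation can be gauged numerically. The sketch below assumes an example PSD $S_X(\omega) = 2a/(a^2 + \omega^2)$, which is the Fourier transform of the ACF $e^{-a|\tau|}$; the band edge `omega0`, bandwidth `d_omega` and function name `psd` are illustrative choices. It compares the exact band power, obtained by integrating the PSD over both passbands, with $2\,S_X(2\pi f_0)\,\delta f$.

```python
import numpy as np

def psd(omega, a=1.0):
    """Example PSD 2a / (a^2 + omega^2), the Fourier transform of the ACF exp(-a|tau|)."""
    return 2.0 * a / (a ** 2 + omega ** 2)

omega0, d_omega = 2.0, 0.01            # assumed band edge and bandwidth (rad/s)

# Integral of S_X over the positive-frequency passband (midpoint rule)
m = 1000
w = omega0 + (np.arange(m) + 0.5) * d_omega / m
band_integral = np.sum(psd(w)) * d_omega / m

# Factor 2 accounts for the mirror-image passband at negative frequencies
power_exact = 2.0 * band_integral / (2.0 * np.pi)

# Narrow-band approximation: 2 * S_X(2*pi*f0) * delta_f
delta_f = d_omega / (2.0 * np.pi)
power_approx = 2.0 * psd(omega0) * delta_f

print(power_exact, power_approx)
```

For this bandwidth the two values differ by well under one percent; widening `d_omega` makes the approximation progressively worse because the PSD is no longer nearly constant across the band.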


White and Coloured Processes 


White Noise 


If we have a zero-mean Wide Sense Stationary process $X$, it is a White
Noise Process if its ACF is a delta function at $\tau = 0$, i.e. it is of the form:
Equation:

\[
r_{XX}(\tau) = P_X\,\delta(\tau)
\]

where $P_X$ is a constant.


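The PSD of $X$ is then given by
Equation:
\[
\begin{aligned}
S_X(\omega) &= \int P_X\,\delta(\tau)\,e^{-i\omega\tau}\,d\tau \\
&= P_X\,e^{-i\omega 0} \\
&= P_X
\end{aligned}
\]

Hence $X$ is white, since it contains equal power at all frequencies, as in
white light.


$P_X$ is the PSD of $X$ at all frequencies.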


But:
Equation:
\[
\begin{aligned}
\text{Power of } X &= \frac{1}{2\pi} \int_{-\infty}^{\infty} S_X(\omega)\,d\omega \\
&= \infty
\end{aligned}
\]

so the White Noise Process is unrealizable in practice, because of its infinite
bandwidth.


However, it is very useful as a conceptual entity and as an approximation to
'nearly white' processes which have finite bandwidth, but which are 'white'
over all frequencies of practical interest. For 'nearly white' processes,
$r_{XX}(\tau)$ is a narrow pulse of non-zero width, and $S_X(\omega)$ is flat from zero up
to some relatively high cutoff frequency and then decays to zero above that.


Strict Whiteness and i.i.d. Processes 


Usually the above concept of whiteness is sufficient, but a much stronger 
definition is as follows: 


Pick a set of times $\{t_1, t_2, \ldots, t_N\}$ to sample $X(t)$.


If, for any choice of $\{t_1, t_2, \ldots, t_N\}$ with $N$ finite, the random variables
$X(t_1), X(t_2), \ldots, X(t_N)$ are jointly independent, i.e. their joint pdf is
given by

Equation:

\[
f_{X(t_1), X(t_2), \ldots, X(t_N)}(x_1, x_2, \ldots, x_N) = \prod_{i=1}^{N} f_{X(t_i)}(x_i)
\]

and the marginal pdfs are identical, i.e.
Equation:

\[
f_{X(t_1)} = f_{X(t_2)} = \cdots = f_{X(t_N)} = f_X
\]

then the process is termed Independent and Identically Distributed
(i.i.d.).


If, in addition, $f_X$ is a pdf with zero mean, we have a Strictly White Noise
Process.


An i.i.d. process is 'white' because the variables $X(t_i)$ and $X(t_j)$ are jointly
independent, even when separated by an infinitesimally small interval
between $t_i$ and $t_j$.


Additive White Gaussian Noise (AWGN) 


In many systems the concept of Additive White Gaussian Noise (AWGN) 
is used. This simply means a process which has a Gaussian pdf, a white 
PSD, and is linearly added to whatever signal we are analysing. 


Note that although 'white' and 'Gaussian' often go together, this is not
necessary (especially for 'nearly white' processes).


E.g. a very high speed random bit stream has an ACF which is
approximately a delta function, and hence is a nearly white process, but its
pdf is clearly not Gaussian: it is a pair of delta functions at $+V$ and $-V$,
the two voltage levels of the bit stream.


Conversely a nearly white Gaussian process which has been passed through 
a lowpass filter (See next section) will still have a Gaussian pdf (as it is a 
summation of Gaussians) but will no longer be white. 


Coloured Processes 


A random process whose PSD is not white or nearly white, is often known 
as a coloured noise process. 


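We may obtain coloured noise $Y(t)$ with PSD $S_Y(\omega)$ simply by passing
white (or nearly white) noise $X(t)$ with PSD $P_X$ through a filter with
frequency response $\mathcal{H}(\omega)$, such that, from this equation from our
discussion of Spectral Properties of Random Signals,

Equation:

\[
\begin{aligned}
S_Y(\omega) &= S_X(\omega)\,|\mathcal{H}(\omega)|^2 \\
&= P_X\,|\mathcal{H}(\omega)|^2
\end{aligned}
\]

Hence if we design the filter such that
Equation:

\[
|\mathcal{H}(\omega)| = \sqrt{\frac{S_Y(\omega)}{P_X}}
\]

then $Y(t)$ will have the required coloured PSD.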


For this to work, $S_X(\omega)$ need only be constant (white) over the passband of
the filter, so a nearly white process which satisfies this criterion is quite
satisfactory and realizable.


Using this equation from our discussion of Spectral Properties of Random
Signals and [link], the ACF of the coloured noise is given by
Equation:

\[
\begin{aligned}
r_{YY}(\tau) &= r_{XX}(\tau) * h(-\tau) * h(\tau) \\
&= P_X\,\delta(\tau) * h(-\tau) * h(\tau) \\
&= P_X\,h(-\tau) * h(\tau)
\end{aligned}
\]

where $h(\tau)$ is the impulse response of the filter.
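The following simulation sketch generates coloured noise by filtering white samples and compares the measured ACF with $P_X\,h(-\tau) * h(\tau)$. It is illustrative only: the half-sine pulse shape, its unit-energy normalisation, the length of 10 samples and the helper name `half_sine` are assumptions made for the example, and the ACF is estimated by a time average over a finite record.

```python
import numpy as np

def half_sine(length):
    """Half-sine pulse of `length` samples, normalised to unit energy (assumed shape)."""
    h = np.sin(np.pi * (np.arange(length) + 0.5) / length)
    return h / np.sqrt(np.sum(h ** 2))

rng = np.random.default_rng(3)
x = rng.standard_normal(200_000)           # white samples with P_X = 1
h = half_sine(10)
y = np.convolve(x, h, mode="valid")        # coloured noise at the filter output

# Theoretical ACF for P_X = 1: h(-tau) * h(tau), i.e. the autocorrelation of h
max_lag = 12
acf_h = np.correlate(h, h, mode="full")[h.size - 1 :]   # lags 0 .. L-1
r_theory = np.zeros(max_lag + 1)
r_theory[: acf_h.size] = acf_h

# Time-average estimate of the output ACF
r_hat = np.array(
    [np.dot(y[: y.size - k], y[k:]) / (y.size - k) for k in range(max_lag + 1)]
)

print(np.round(r_theory, 3))
print(np.round(r_hat, 3))
```

With the unit-energy normalisation the output has unit variance, and the ACF is exactly zero beyond the filter length, so a longer filter gives a more slowly decaying ACF and a narrower PSD.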


This Figure from previous discussion shows two examples of coloured 
noise, although the upper waveform is more 'nearly white’ than the lower 
one, as can be seen in part c of this figure from previous discussion in 
which the upper PSD is flatter than the lower PSD. In these cases, the 
coloured waveforms were produced by passing uncorrelated random noise 
samples (white up to half the sampling frequency) through half-sine filters 
(as in this equation from our discussion of Random Signals) of length 

T, = 10 and 50 samples respectively. 


